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Statement  of  the  Problem  Studied 


In  this  project,  we  focused  on  developing  new  statistical  models  for  characterizing  the  dynamic 
evolution  of  networks  and  detecting  changes  deviating  from  the  normal  evolving  trajectory. 
Modeling  the  dynamics  of  evolving  networks  and  detecting  abnormal  changes  is  of  great 
interest  in  many  domains.  Figure  1  gives  examples  of  three  such  domains,  including  (a)  brain 
networks  in  developmental  studies,  (b)  social  networks  in  security  surveillance,  and  (c)  wireless 
communication  networks  in  quality  of  service  monitoring  and  intrusion  detection.  What 
complicates  the  problem  even  more  is  that  the  network  measurement  data  is  usually  of  mixed- 
type  and  multi-dimension.  For  example,  brain  networks  may  be  measured  by  functional  or 
structural  neuroimaging  techniques;  social  interaction  may  be  through  multi-type  social  media 
(facebook,  twitter,  Linkedln,  etc);  wireless  communication  may  be  measured  by  logic  link, 
connectivity,  or  connection  strength.  Despite  the  interest  and  importance  of  the  problem,  the 
fundamental  methodological  development  in  statistical  modeling  has  been  lacking.  In  this 
project,  we  studied  the  problem  through  three  specific  aims,  including: 

Aim  1:  Multi-type  multi-dimension  network  data  fusion 

Aim  2:  Dynamic  modeling  of  evolving  networks 

Aim  3:  Anomaly  detection  in  evolving  networks 

Table  1  summarizes  the  results  of  our  literature  review,  which  shows  that  research  for 
addressing  these  specific  aims  is  lacking  and  therefore  this  project  is  timely  and  impactful. 

Table  1:  Specific  aims  of  this  project  and  limitations  of  related  existing  work 


Aims 

Limitations  of  existing  work  (references  omitted  due  to  space  limit) 

Multi-type  multi¬ 
dimension  network  data 
fusion 

Latent  variable  models  focus  on  “variables”  not  “networks”. 

Relational  learning  focuses  little  on  fusion  of  multi-type  relations 
(qualitative  vs.  quantitative,  different  distributions  or  uncertainties). 

Dynamic  modeling  of 
evolving  networks 

Dynamic  or  time-varying  graphical  models  either  assume  the 
dynamics  to  be  Markovian  or  focus  on  dynamic  attribute  data  but  not 
relational  data. 

Anomaly  detection  in 
evolving  networks 

Statistical  Process  Control  (SPC)  methods  focus  on  monitoring  of 
“variables”  not  “networks”;  in  network  settings,  SPC  has  only  been 
limitedly  used  for  network  centrality  measures. 
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Figure  1:  Modeling  the  dynamics  and  changes  in  evolving  networks  is  of  great  interest  in  many 
domains:  (a)  brain  development;  (b)  social  interaction;  (c)  wireless  communication 


Summary  of  the  Most  Important  Results 


The  important  outcomes  from  this  project  are  summarized  in  three  sections:  research, 
collaboration,  and  education  and  human  resource  development. 

Research: 

Though  the  original  proposal  of  the  project  was  written  in  the  context  of  social  networks,  we 
found,  during  the  performance  period  of  this  project,  that  the  fundamental  methodological 
development  is  needed  in  and  also  applicable  to  many  other  domains  with  network  data  such  as 
brain  networks  and  wireless  communication  networks  that  are  closely  related  to  the  Pi’s  past 
and  ongoing  research  experience. 

Aims  1  and  2:  We  studied  extensively  the  characteristics  of  the  networks  in  these  domains  and 
developed  three  new  statistical  models  for  modeling  the  underlying  dynamics  of  the  networks 
(Aim  2).  It  turned  out  that  in  two  of  the  three  models,  fusion  of  multi-dimensional  network  data  is 
just  a  special  case  of  dynamic  network  modeling,  i.e.,  by  replacing  the  time  index  with  the 
dimension  index  ( Aim  1).  Details  of  the  model  development  are  summarized  in  Appendix  A. 
Table  2  briefly  summarizes  the  statistical  formulation  and  solution  algorithms  of  the  model 
development  as  well  as  the  properties  for  each  model. 

Table  2:  Statistical  formulation,  solution  algorithms,  and  properties  of  three  proposed  dynamic 

models  for  network  data 


Proposed 

Models 

Statistical  formulation  and  solution 
algorithms 

Properties 

(1)  A  kernel 

smoothing 

estimator 

The  network  at  time  t  is  modeled  by  a 
Gaussian  Process  (GP).  The  likelihoods  of 
network  data  at  different  time  stamps  are 
combined  by  kernel  weights.  A  Bayesian 
posterior  estimate  for  the  GP  generates  a 
smooth  estimator  for  the  network  evolution 
over  time.  A  computationally  efficient 
approximation  algorithm  based  on 
exponential-propagation  (EP)  was 
developed  to  solve  the  Bayesian  posterior 
estimation. 

The  underlying  assumption  of  this  model  is 
that  the  GP  is  smooth  in  time,  i.e.,  its  1st  and 
2nd  order  derivatives  w.r.t.  time  are  upper- 
bounded  by  a  finite  constant.  So  this  model 
is  applicable  to  networks  that  evolve 
smoothly  in  time.  It  can  easily  incorporate 
attribute  and  multi-type,  multi-dimension 
relational  data  into  the  modeling.  The 
limitation  is  the  difficulty  in  checking  the 
smoothness  assumption  holds. 

(2)  A  state- 

space 

framework 

The  networks  at  different  time  stamps  are 
modeled  by  state  vectors  that  are  related  to 
each  other  by  a  Markov  dynamic  model. 

The  network  data  at  each  time  stamp  is 
related  to  the  state  vector  at  that  time 
through  a  probability  distribution  that  can 
be  approximated  to  a  Gaussian  distribution 
by  EP.  An  algorithm  similar  to  optimal 
Bayesian  filtering  was  developed  to  solve 
the  state  vector  estimation.  This  model  is 

This  model  explicitly  characterizes  the 
network  dynamics  by  putting  a  state  space 
model  on  the  model  parameters.  This 
provides  a  clean  interpretation  for  the 
dynamics  and  also  makes  it  easier  to  model 
different  kinds  of  anomalies.  It  can  easily 
incorporate  attribute  and  multi-type,  multi¬ 
dimension  relational  data  into  the  modeling. 
The  downside  may  be  the  computational 
cost  of  the  solution  algorithm  since  it  is 

similar  to  the  state  space  model  in  the 
literature  but  extended  to  a  network  setting. 

executed  recursively. 

(3)  A 
spectral 
evolution 
model 

Spectral  decomposition  of  networks  at 
each  time  stamp  is  done  with  a  constraint 
that  the  eigenvectors  for  different  time 
stamps  are  related  by  a  transformation 
matrix  to  cope  with  the  translational, 
rotational  and  scaling  effects.  An 
optimization  algorithm  was  developed  to 
estimate  the  eigenvalues  and  eigenvectors 
at  different  time  stamps. 

This  model  characterizes  the  spectral 
evolution  of  the  networks,  i.e.,  the  evolution 
in  the  eigen-space  not  the  original  space.  So 
it  provides  a  different  but  complementary 
perspective  to  the  other  two  proposed 
models.  This  has  clear  interpretation  that  the 
eigenvalues  capture  how  the  density  of  the 
networks  evolves  and  the  eigenvectors 
capture  how  the  structure/connectivity  of  the 
networks  evolves.  This  model  cannot  easily 
incorporate  attribute  data.  Efficient 
algorithms  for  spectral  decomposition  of 
large  networks  are  yet  to  be  developed. 

Aim  3:  Based  on  the  proposed  dynamic  models,  we  further  developed  methods  for  detecting 
abnormal  changes. 

Ongoing  research :  We  believe  that  we  have  accomplished  the  most  challenging  part  -  model 
development,  during  the  nine-month  project  performance  period.  We  are  currently  doing 
extensive  simulation  experiments  to  evaluate  and  compare  the  three  proposed  models.  We 
have  also  found  a  very  good  resource  of  publicly  available  large  network  data  repository, 
http://snap.stanford.edu/data/index.html.  We  plan  to  apply  our  models  to  these  data  and 
compare  the  results  with  published  literature  on  the  same  data.  We  plan  to  present  the  research 
results  in  INFORMS  2014  Annual  Conference  and  also  submit  a  journal  paper. 

Collaboration: 

In  a  mixer  event  to  promote  academia-defense  industry  collaboration,  the  PI  presented  some  of 
the  research  results,  which  drew  great  attention  from  US  Army  Electronic  Proving  Ground 
(EPG).  Later,  the  PI  developed  a  collaborative  project  with  EPG  for  modeling  large  mobile 
communication  networks  and  monitoring  and  detecting  poor  quality  of  service  (QoS)  and 
potential  adversarial  intrusion.  Mobile  communication  networks  are  commonly  deployed  in 
battlefields  for  soldiers  to  communicate  with  each  other  and  with  the  commander  for  situation 
awareness  and  mission  accomplishment.  The  QoS  and  security  of  the  networks  are  crucial. 


Education  and  human  resource  development: 

Change  detection  or  SPC  for  network  data  has  not  been  traditionally  taught  in  industrial 
engineering  classes.  With  the  abundance  of  network  data  in  various  application  domains,  there 
is  an  urgent  need  to  introduce  new  methodologies  for  network  data  modeling  and  monitoring  to 
students.  The  PI  has  incorporated  part  of  the  research  results  into  the  teaching  materials  of  her 
advanced  SPC  class.  In  addition,  the  PI  was  successfully  promoted  to  Associate  Professor  with 


tenure,  for  which  this  grant  served  as  one  honorable  recognition.  The  Ph.D.  student  supported 
by  this  grant  also  made  very  good  progress  toward  his  thesis. 


Appendix  A:  Details  for  the  Statistical  Formulation  and  Solution  Algorithms  for 

Three  Proposed  Dynamic  Models 

We  propose  three  different  methods  for  modeling  network  dynamics. 

Assume  that  there  are  N  nodes  in  the  network.  Denote  the  nodes  by  xv  ...,xN.  Each  node  is 
associated  with  a  vector  of  attributes.  Let  at  denote  the  vector  of  attributes  for  node  xt.  Let  E(t) 
be  the  observed  network/relational  data  of  the  nodes  at  time  t.  E(t)  is  an  N  x  N  matrix.  £,y 
denote  a  link  between  nodes  xt  and  xj  in  E(t).  The  value  of  sf?  can  be  0/1  to  represent  the 
existence  of  a  relation  or  a  numerical  number  to  represent  the  strength  of  a  relation. 


Model  1:  A  kernel-smoothing  estimator  for  latent  dynamic  networks 


Formulation'.  We  can  model  the  latent  network  dynamics  by  a  time-varying  Gaussian  Process 
(GP).  Let  f(t)  =  (jf\  be  random  variables  corresponding  to  nodes  xlt  ...,xN  at  time  t. 

The  prior  distribution  of  f(t)  is  assumed  to  be  a  zero-mean  GP  whose  covariance  matrix  is 

defined  by  the  attribute  data,  i.e.,  p(f(t))  =  GP(0,  Ka)  and  Ka(x;,x/)  =  exp(-l^f)- The 

likelihood  function  of  link  £®  is  given  as  follows  [1]: 


p  (sff  =  i|  st,  Sj)  =  f1  lf  (ft  )  +  Si)  W }  +  sj) 

lo  o.w 


>  0 


8i~N(0,  ct|)  denotes  measurement  errors.  Though  0/1 -type  network  data  is  considered  here,  we 
will  show  later  that  this  framework  allows  incorporation  of  numerical-valued  network  data. 
Integrating  out  8t  and  Sj, 


(1) 


O(-)  is  the  cumulative  density  function  of  the  standard  Gaussian  distribution.  Assuming  the  links 
are  independent,  the  likelihood  function  of  the  network  is  p(E<;t)|f(t)j  =  fl;  ,/P  (ff 


Consider  that  the  distribution  of  f(t)  changes  smoothly  over  time.  Then,  the  joint  likelihood  of  the 
networks  at  all  times  can  be  written  as  a  weighted  combination  of  individual  likelihoods  and  the 
weights  are  kernel  functions  of  time,  i.e., 


logp(E^1^,  ...,E^|f^)  =  logp(E^|f^), 


where  wAs,t)  =  k  (^~)/2s=ik  According  to  this  definition,  the  weight  is  the  largest  when 

s  =  t  and  decreases  as  s  is  away  from  t.  Using  this  joint  likelihood  function,  the  posterior  of  fW 


is 


(2) 


p(fW|E« . EW)  =p(fW)xnLiP(E(s)|f(t))' 


l/fet) 


Estimation :  The  exact  posterior  distribution  in  this  form  is  non-parametric.  Though 
computational  methods  like  MCMC  can  be  used  to  sample  from  the  exact  posterior  distribution, 
it  is  computationally  too  intense  for  large  networks.  We  propose  an  approximation  method 
based  on  expectation-propagation  (EP)  [2]  to  obtain  an  approximate  Gaussian  distribution  for 

the  exact  posterior  distribution.  Specifically,  we  approximate  v{p\f  by  a  Gaussian 


distribution  Ppexp 


/ 

rf(t)i 

T 

iy(Oi 

H 

h 

At) 
-J]  - 

nf 

h 

At) 

Jj  - 

where  r-P 


and  Ilf/0 


are  successively  optimized  by 


locally  minimizing  the  K-L  divergence.  At  the  equilibrium  the  EP  algorithm  returns  a  Gaussian 
approximation  to  p(E(s)|f(t))~N  (o,  n(s)  where  n(s)  =  is  nj^  augmented  to  an 

N  x  N  matrix.  Inserting  this  approximate  distribution  to  (2),  the  posterior  distribution  of  f(t) 
becomes  p(f^|E(:i),  ...,Em)~JV  (o,  (K^1  +  £s=i  w^s,t)  n^)  That  is,  the  inverse  covariance 

matrix  of  the  approximate  posterior  Gaussian  distribution  is  a  sum  of  the  prior  inverse 
covariance  matrix  and  a  kernel-smoothing  estimator  of  the  inverse  covariance  matrix  based  on 
network  data. 


As  a  more  efficient  approach,  n(s)  can  be  obtained  from  the  graph  Laplacian  matrix  of  the 
network.  In  this  way,  numerical-valued  network  data  can  also  be  incorporated  using  a  weighted 
graph  Laplacian  matrix. 

In  this  way,  the  latent  dynamic  networks  can  be  characterized  by  the  posterior  inverse 
covariance  matrix,  K^1  +  Es=i w(s,t)  n(s),  t  =  1,  ...T.  Note  that  if  T  =  t,  then  it  is  an  estimation 
problem;  if  T  <  t,  it  is  a  prediction  problem;  if  T  >  t,  it  is  an  smoothing  problem. 

Extension :  If  there  are  multi-dimension  networks  at  each  time  s,  the  p(E(s)|f(t))  in  (2)  can  be 
written  as  p(E(s,1), ... ,  E(S,M)  |f(t)),  where  M  denotes  the  number  of  networks  under  consideration. 

p(E(s'1), ... ,  e(s,m) |f (t))  =  rim=i  p(E(s,m)  |f (t))  .  Then,  all  pervious  derivations  apply  naturally. 

Model  2:  A  state-space  framework  for  latent  dynamic  networks 

Formulation :  In  this  formulation,  the  dynamics  of  the  latent  networks  is  modeled  by  a  state- 
space  model,  i.e.,  p(f(t)|f(t_1),  ....fC1)).  Under  the  simplest  scenario  when  the  Markov  property 
holds,  p( f |f ... ,  f (1))  =  p(f |f Assuming  a  linear  Gaussian  state-space  model, 


f  CO  =  A(t-i)f(t-i)  +  q(t-i)j 


q(t  1)~n(q>  1)).  The  measurement  model  is  the  same  as  (1 ),  i.e., 


p(E(t)|fW)  =  nuP(£® 


(3) 


11  |  P  a  g  e 


Estimation :  Following  the  Bayesian  optimal  filtering  theory  [3],  the  estimation  for  f(t)  can  be 
conducted  recursively  as  follows: 

Initialization:  p(f(0))~GP(0,  Ka)  defined  by  the  attribute  data 

Prediction:  the  predicted  distribution  of  f  W  given  data  from  time  s  =  1, ... ,  t  -  1  is: 

p(f®  |E(1:t_1))  =  Jp(f®  |f(t_1))p(f(t_1)|E(1:t_1))  (4) 

Update:  given  the  data  at  time  t,  the  posterior  distribution  of  can  be  computed  by  the  Bayes’ 
rule: 

p(f(t)|E(1:t))  =  lp(f(t)|E(1:t-1))  x  p(E«|fW),  (5) 

Zt 

Zt  is  a  normalization  constant. 

We  can  use  the  EP  algorithm  to  approximate  the  distribution  in  (3)  by 
p(EW\fW)~N  (f®|0,nW_1).  Letp(fW|E(1:ti)  =  N( f«|0,P«).  Then, 

p(f (t) |E(1:t-D)  =  N  (f (0 10,  a^-^P^-Da^-1)7,  +  QC"1)). 

Furthermore,  p(f(t)|E(1:t))  oc  N  (fW|0,  N  (f(t) |0,  n(t)  = 

iv^t)|o,|{A(t-1)p(t-1)A(t-1)r  +  Q(t-1)}  1  +  n®J  y 

Consider  a  special  case  when  A(t_1)  =  I  and  =  <72I.  Then,  the  prediction  and  update  in 

(4)  and  (5)  become: 

p(f(t)|E(1:t-1))  =  7V(f  Ct)  |o,  pCt-i)  +  CT2j^  and 
p(f(t)|E(i:t))  =  N  (f(t)|0,{{p(*-i)  +  erg  i}_1  +  nwj'1) . 

Write  out  a  few  time  stamps: 

at  t  =  1,  p(fW|E(1:°))  =  p(f(°))~GP(0,Kfl)  and  p(f(1)|E(1:1))  =  N  (f«|0,  (K*1  +  nW)'1) 

at  t  =  2,  p(f(2)|E(1:1))  =  iV  (f^|0,  (K"1  +  nW)'1  +  cr2l)  and  p( f C2) |eC1:2>)  =  N  ^f(2)|0,  |{(K“1  + 
nW)-1  +  cr2i}_1  +  n®} 

If  static  networks  are  assumed,  i.e,  a2  =  0.  Then,  p(f(t)|E(1:t_1))  =  yv  (K"1  +  n(1)  H - h 

na-i))-1)  and  p(f(t)|E(1:t))  =  ^(f^lo^Kfl1  +  n«  +  •••  +  n®)_1). 


If  A(t  ^  =  A,  ArA  =  I,  and  er|  =  0,  then,  p(f®|E(1:t  1))  =  A(f®|0,  AP(t  1)A7’)  and 
p(f(t)|E(1:t))  =  w(f®|0,{AP(®1)_1A7’  +  n®} 

Extension :  If  there  are  multi-dimension  networks  at  each  time  t,  the  p(E®|f®)  in  (5)  can  be 
written  as  p(E(t'1), E®M)|f®),  where  M  denotes  the  number  of  networks  under  consideration. 

p(E(t,1), ... ,  E  (t,w)|f(t))  =  nJn=i p(E®m)  |f  ®  )  .  Then,  all  pervious  derivations  apply  naturally. 

Model  3:  A  spectral  evolution  model  for  latent  dynamic  networks 

Formulation :  A  spectral  decomposition  for  a  network  is  E®  =  U(t)D(t)U®  ,  where  D®  = 
diag(d1  ,d2 ,  ...,dK)  is  a  diagonal  matrix  consisting  of  K  non-zero  eigenvalues  of  E®  and 
U®  =  [u®,  ...,u®]  consist  of  eigenvectors,  K  <  N.  For  noisy  network  data,  D®  can  consist  the 

first  K  significantly  large  eigenvalues.  In  network  spectral  analysis  [4],  it  has  been  found  that  D® 
characterizes  the  density  of  the  network  and  U®  characterizes  the  structure  of  the  network. 


In  conventional  network  spectral  analysis,  it  assumes  that  there  is  a  “constant”  U  underlying  the 
network  evolution.  To  extend  it,  our  formulation  allows  that  U®  =  UB®  where  B®  is  the 
transformation  matrix  to  cope  with  the  translational,  rotational  and  scaling  effects.  Given  the 
network  data  E®, ...,  E(T),  modeling  the  network  dynamics  can  be  formulated  into  the  following 
optimization  problem: 


minU«,DW,BW,U,t=ll . T  {Et=l  ||E(t)  -  U®  D®U®T  J, 

Subject  to  (U®)TU®  =  1,  U®  =  UB®  for  t  =  1,2,  ...,T. 
Estimation :  It  is  not  hard  to  derive  that  (6)  is  equivalent  to  the  following: 


(6) 


mm, 


Dtt).c(0.„i . T  {S-1 1 E<t)  -  uMdCOum1'  IQ 


Subject  to  (U®)TU®  =  1,  U®  =  U(t_1)C®  for  t  =  1,2, ... ,  T.  (7) 

Because  consists  of  K  eigenvectors  orthogonal  to  each  other,  solving  the  optimization  in  (7) 
can  be  done  sequentially  for  each  eigenvector  and  recursively  along  the  time,  i.e., 


mindM„M 


E®  -  d®u®u®T 


2 

F 


Subject  to  u®Tu®  =  1,  u®  =  U(t“1}c®  (8) 

Start  from  E®  =  E(t).  After  u(t)  and  d®  are  obtained  by  solving  (8),  E®  =  E®  - 

/v  T 

d®  u®u(t)  and  (8)  is  solved  again  to  obtain  the  next  pair  of  eigenvalue  and  eigenvector. 
is  treated  as  constant,  i.e.,  the  estimated  eigenvector  matrix  for  the  previous  time  stamp. 
Though  the  proof  is  omitted  here,  we  have  derived  that  the  first  pair  of  eigenvalue  and 


eigenvector  can  be  obtained  by  the  largest  eigenvalue  and  corresponding  eigenvector  for  matrix 

E(t)ij(t-i)  ^u(t-i)7’u(t-i)j  1  u(t-1)T(E(t))T. 


Appendix  B:  Details  for  Change  Detection  based  on  the  Three  Proposed  Dynamic 

Models 


Change  detection  based  on  model  1:  Let  E(1\...,Em  be  the  dynamic  networks  during  the 
normal  time  period  t  =  1,  Let  t*  be  a  future  time  at  which  we  want  to  determine  if  there  is 
an  abnormal  change  occurring.  Based  on  model  1  described  in  Appendix  A,  we  can  derive  a 
predicted  distribution  for  the  network  at  time  t*  from  the  dynamic  evolving  trajectory  during  the 
normal  time  period,  i.e., 

p(f(t*)|E(1) . E^)~N  (o,  (K-1  +  £s=i  w<s'£,)  n^)"1). 

It  can  be  further  derived  that  the  \fj(t  ]|  associated  with  nodes  xt  and  x;-  follows  a 

K*(X;,Xi)  K*(x;,x;-)' 

K*(X;,X;-)  K*(x;-,X;) 

k(x;,x;)  —  KiT(l  +  Ka  SLiwAOnOO)  n^Kj.  k(x;,x/)  is  the  element  at  the  i-th 

row  and  ;-th  column  of  Ka  .  k;  is  a  column  vector  corresponding  to  the  i-th  column  of  Ka  . 


K*(Xi,X;-)  = 


Gaussian  distribution  ~N(0,E-;),  where 


Then,  the  predicted  distribution  for  a  link  between  xt  and  x;-  at  time  t*  is: 

-iT 


r(t’) 


p {eij >)|  E(1) . Em)  =  J p  (slplfP.fP)  N (\fln,fp]  1 0Xtj) dfPdft 

(  1  arcsin(r£(it- 

which  can  be  derived  to  be  equal  to  v\^\j  E(1), E(T)J  =  _  - A — LJ.t 


where  r  = 


K  *{pCi,Xj) 


=.  Once  the  network  at  time  t*  is  observed,  the  probability  of  the  network 


I  K*(Xj,Xj)K*  (Xj,Xj) 

according  to  the  normal  evolving  trajectory  can  be  computed  as  riiy  P  )  E^,  ...,Em). 

Though  more  sophisticated  change  detection  rules  are  to  be  developed,  an  intuitive  and 
practically  good  approach  is  to  compare  this  probability  with  that  of  a  random  network,  i.e.,  the 

probability  of  a  link  existing  between  any  two  nodes  is  0.5.  If  fl i;p(^iy  )  ...,Em)  is  bigger 

than  a  random  network,  then  it  means  that  there  is  some  kind  of  abnormal  change  at  time  t*. 


Change  detection  based  on  model  2:  Because  model  2  also  produces  a  predicted  distribution 
for  the  network  at  a  future  time  t*,  a  similar  approach  to  the  change  detection  method  based  on 
model  1  can  be  applied.  The  details  are  omitted  here  because  only  minimum  modification  is 
needed. 


Change  detection  based  on  model  3:  We  employ  the  recently  developed  distribution-free 
control  chart,  the  Reproducing  Kernel  Hilbert  Space  (RKHS)-EWMA  chart  [5],  for  monitoring  U 
and  D.  Specifically,  in  what  follows,  we  illustrate  how  the  RKHS-EWMA  chart  can  be  employed 


for  monitoring  D,  while  the  monitoring  of  U  follows  similar  procedure.  The  estimated 
...,U(r)}  and  {D(1),  D(2), D(T)}  from  the  in-control  training  data  {E(1),E(2), 
provides  an  in-control  library  of  the  normal  states  of  the  process.  We  denote  the  in-control 
process  distribution  as  Pic  (x)  where  ...,D(,r)}  are  its  samples.  In  the  process 

monitoring  phase,  individual  network  data  will  be  continuously  collected,  denoted  as 
e(t+1),  e(t+2),  ...,  E(t+w).  Then,  we  can  estimate  the  D(T+1),D(:’r+2),  ...,D(T+W)  and 
uC7'+1)<u(r+2) . u(T+w)  by  solving  (8). 

The  RKHS-EWMA  chart  is  employed  to  detect  the  process  change  in  D(7’+1),D(T+2),  ...,D(:’r+w) 
by  comparing  them  with  the  in-control  library  D(2), D(T)}.  Formerly,  the  process  model 
under  study  can  be  written  as 

D(r+n)  (  PicW.forn  =  1,2,  ...,r 

\Poc(x),for  n  =  r  +  l,r  +  2, ...’ 


Here,  r  is  the  time  point  when  a  process  change  occurs,  and  Poc  (x)  is  the  out-of-control 
process  distribution.  We  let  Pt{x)  denote  the  process  distribution  at  time  point  t,  i.e.,  Pt(x) 
equals  to  Pic  (x)  when  t  <  r,  and  Pt(x)  equals  to  Poc  (x)  when  t  >  r.  The  essential  component  of 
process  monitoring  is  the  computation  of  the  distance  between  two  distributions,  D{PIC,Pt).  An 
out-of-control  signal  will  be  triggered  if  the  estimated  D(PIC,Pt )  is  statistically  different  from  zero. 
In  the  estimation  of  D(PIC,Pt),  many  conventional  SPC  methods  first  estimate  Pic  and  Pt  in 
either  parametric  or  nonparametric  (or  distribution-free)  way.  However,  the  estimation  of  high¬ 
dimensional  distributions  has  been  recognized  as  an  ill-posed  inverse  problem,  which  is 
generally  very  difficult  to  solve  and  needs  a  very  large  number  of  samples  to  guarantee  an 
acceptable  estimation  uncertainty,  in  order  to  not  overwhelm  the  subsequent  estimate  of  the 
distance.  In  contrast  to  many  conventional  SPC  methods,  the  RKHS-EWMA  chart  estimates 
D(P,C,  Pt)  directly,  by  uniquely  embedding  Pic  and  Pt  in  the  RKHS.  Details  of  how  to  implement 
the  RKHS-EWMA  chart  can  be  found  in  [5]. 
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